COMPUTERIZED METHOD FOR DETERMINATION OF THE LIKELIHOOD OF
MALIGNANCY FOR PULMONARY NODULES ON LOW-DOSE CT 
[0001] The present invention was made in part with U.S. Government support under USPHS 
Grant CA24806. The U.S. Government has certain rights in this invention. 

BACKGROUND OF THE INVENTION 

Field of the Invention 

[0002] The invention relates generally to a computerized method for determination of the 
likelihood of malignancy for pulmonary nodules on a low-dose CT X-ray device. 
[0003] The present invention also generally relates to computerized techniques for automated 
analysis of digital images, for example, as disclosed in one or more of U.S. Patents 
4,839,807; 4,841,555; 4,851,984; 4,875,165; 4,907,156; 4,918,534; 5,072,384; 5,133,020; 
5,150,292; 5,224,177; 5,289,374; 5,319,549; 5,343,390; 5,359,513; 5,452,367; 5,463,548; 
5,491,627; 5,537,485; 5,598,481; 5,622,171; 5,638,458; 5,657,362; 5,666,434; 5,673,332; 
5,668,888; 5,732,697; 5,740,268; 5,790,690; 5,832,103; 5,873,824; 5,881,124; 5,931,780; 
5,974,165; 5,982,915; 5,984,870; 5,987,345; 6,011,862; 6,058,322; 6,067,373; 6,075,878; 
6,078,680; 6,088,473; 6,112,112; 6,138,045; 6,141,437; 6,185,320; 6,205,348; 6,240,201; 
6,282,305; 6,282,307; 6,317,617 as well as U.S. patent applications 08/173,935; 08/398,307 
(PCT Publication WO 96/27846); 08/536,149; 08/900,189; 09/027,468; 09/141,535; 
09/471,088; 09/692,218; 09/716,335; 09/759,333; 09/760,854; 09/773,636; 09/816,217; 
09/830,562; 09/818,831; 09/842,860; 09/860,574; 60/160,790; 60/176,304; and 60/329,322; 
co-pending applications (listed by attorney docket number) 215807US-730-730-20, 
215752US-730-730-20, 216439US-730-730-20 PROV, and 216504US-730-730-20 PROV;
and PCT patent applications PCT/US98/15165; PCT/US98/24933; PCT/US99/03287;
PCT/US00/41299; PCT/US01/00680; PCT/US01/01478 and PCT/US01/01479, all of which
are incorporated herein by reference.

[0004] The present invention includes use of various technologies referenced and described 
in the above-noted U.S. Patents and Applications, as well as described in the references 
identified in the following LIST OF REFERENCES by the author(s) and year of publication 
and cross-referenced throughout the specification by reference to the respective number, in 
parentheses, of the reference:

[0027] The entire contents of each related patent and application listed above and each
reference listed in the LIST OF REFERENCES are incorporated herein by reference.

Discussion of the Background

[0028] Recently, medical professionals have been able to diagnose lung cancer with the aid
of computed tomography (CT) imaging systems. A CT system is an X-ray device used to
produce cross-sectional images of organs. For instance, a CT system may be used to produce
a series of cross-sectional images of the human lung. Radiologists examine such a series of
images to detect pulmonary nodules and to determine whether the nodules are malignant or
benign. If a radiologist can confidently confirm that a pulmonary nodule is benign, further
medical examination can be avoided.

[0029] However, diagnosis of pulmonary nodules is a particularly difficult task for
radiologists. Typically, a radiologist examines the series of images of a human lung
produced by a CT system and identifies pulmonary nodules by visual examination. This
process is time consuming, and radiologists are in fairly high demand. Further, there is an
element of human error, in that pulmonary nodules may be missed on a CT image. In some
instances, radiologists are unable to determine from visual examination whether an
identified pulmonary nodule is malignant or benign, and unnecessary further medical
examination may be performed. Unnecessary medical examination is undesirable for several
reasons. One reason is that it is financially costly for both patients and health care
providers. Another reason is that further medical examination is often painful for the
patient. For instance, it may entail additional X-ray images taken of the patient, which
has adverse side effects. Another reason is that if unnecessary medical examinations
can be avoided, more patients can be served by the limited number of CT imaging devices
and available radiologists. It is also important that the diagnosis of pulmonary nodules be
accurate, so that malignant pulmonary nodules can be diagnosed during the early stages of
lung cancer. There is a tendency among radiologists to assume that a pulmonary nodule is
malignant if it cannot be determined whether the pulmonary nodule is benign or malignant.
However, according to recent findings on low-dose helical CT screening for lung cancer,
83% of 605 patients with suspicious pulmonary nodules have benign lesions, whereas only
105 patients have malignancy. Accordingly, a majority of patients with suspicious
pulmonary nodules do not need further medical examination after initial screening on
low-dose helical CT. Nevertheless, many patients with benign pulmonary nodules undergo
further medical examination, because of human error or because it cannot be determined
during initial screening that the suspicious pulmonary nodules are benign.

SUMMARY OF THE INVENTION 
[0030] The above-mentioned deficiencies in the diagnosis of pulmonary nodules are mitigated
by the present invention, which relates to a method and system for determining if a pulmonary
nodule is malignant. The method and system of the present invention include the steps of
obtaining at least one medical image of a pulmonary nodule and determining if the
pulmonary nodule is malignant based on the examination of seven patient or image features.
[0031] In embodiments of the present invention, the patient features comprise the sex of
the patient. In embodiments of the present invention, the image features of a pulmonary
nodule are extracted from a CT image of the pulmonary nodule. In embodiments of the
present invention, the image features comprise the effective diameter of the pulmonary
nodule, the contrast of the pulmonary nodule, the overlap measure of two gray-level
histograms from the inside and outside regions of a segmented nodule of the image, the
overlap measure of two gray-level histograms from the inside and outside regions of a
segmented nodule of an edge gradient of the image, the radial gradient index for an inside
region of a segmented nodule of an image, and the peak value of a histogram for an inside
region of a segmented nodule of an edge gradient of an image.

[0032] Out of the features that are analyzed to determine if a pulmonary nodule is
malignant, seven features are selected to optimize the accuracy of the diagnosis of a
pulmonary nodule. Through a unique sampling scheme, different embodiments of the 
present invention utilize different combinations of features to optimize the accuracy of the 
method of the present invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[0033] A more complete appreciation of the invention and many of the attendant advantages 
thereof will be readily obtained as the same becomes better understood by reference to the 
following detailed description when considered in connection with the accompanying 
drawings wherein: 

[0034] Figure 1 is a pictorial view of a CT imaging system. 

[0035] Figure 2 is a block schematic diagram of the system illustrated in Figure 1.

[0036] Figure 3 illustrates three exemplary malignant nodules (a), (b), (c) and three benign
nodules (d), (e), (f) on low-dose helical CT.

[0037] Figure 4 is an exemplary gray-level histogram for inside and outside regions of the 
segmented nodule for the malignant nodule in Fig. 3 (a) and the benign nodule in Fig. 3 (f). 
[0038] Figure 5 illustrates the relationship between effective diameter and peak value of the 
histogram for the inside region of the segmented nodule on LDCT image. 
[0039] Figure 6 is an exemplary schematic illustration for the outlines of two segmented 
regions (solid curves), a set of radial lines (dashed lines), and the intersection points (solid 
circles) between the outlines and the radial lines. 

[0040] Figure 7 is an exemplary illustration of the nodule segmentation (see Fig. 3(a)), with 
original LDCT image (a), edge candidate points (b), and segmentation result (c). 
[0041] Figure 8 is a schematic illustration for two adjacent radial lines (solid lines), four 
edge points on the radial lines (solid circles A, B, C, and D), and a virtual edge point (open
circle E) that were used in the determination of an optimal outline for a nodule by use of a
dynamic programming technique. 

[0042] Figure 9 illustrates extracted nodule regions by the automated nodule segmentation 
for malignant nodules (a), (b), (c) and benign nodules (d), (e), (f). 

[0043] Figure 10 illustrates ROC curves obtained by use of the LDA with seven features for
distinguishing benign nodules from malignant ones.

[0044] Figure 11 illustrates distributions of LDA output indicating the likelihood of
malignancy obtained with (a) the single slice method and (b) the multiple slice method.
[0045] Figure 12 is a schematic illustration of a computer system for the
computerized analysis of the likelihood of malignancy in pulmonary nodules.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
MATERIALS AND METHODS 

[0046] Referring to Figures 1 and 2, an exemplary computed tomography (CT) imaging 
system 10 is shown as including a gantry 12. Gantry 12 has an X-ray source 14 that projects 
a beam of X-rays 16 toward a detector array 18 on the opposite side of gantry 12. Detector 
array 18 is formed by detector elements 20 which together sense the projected X-rays that 
pass through a medical patient 22. Each detector element 20 produces an electrical signal 
that represents the intensity of an impinging X-ray beam and hence the attenuation of the 
beam as it passes through patient 22. During a scan to acquire X-ray projection data, gantry 
12 and the components mounted thereon rotate about a center of rotation 24. 
[0047] Rotation of gantry 12 and the operation of X-ray source 14 are governed by a 
control mechanism 26 of CT system 10. Control mechanism 26 includes an X-ray controller 
28 that provides power and timing signals to X-ray source 14 and a gantry motor controller 
30 that controls the rotational speed and position of gantry 12. A data acquisition system
(DAS) 32 in control mechanism 26 samples analog data from detector elements 20 and
converts the data to digital signals for subsequent processing. An image reconstructor 34 
receives sampled and digitized X-ray data from DAS 32 and performs high speed image 
reconstruction. The reconstructed image is applied as an input to a computer 36 which stores 
the image in a mass storage device 38. 

[0048] Computer 36 also receives and supplies signals via a user interface, or graphical
user interface (GUI). Specifically, computer 36 receives commands and scanning parameters
from an operator via console 40 that has a keyboard and a mouse (not shown). An associated
cathode ray tube display 42 allows the operator to observe the reconstructed image and other
data from computer 36. The operator-supplied commands and parameters are used by
computer 36 to provide control signals and information to X-ray controller 28, gantry motor
controller 30, DAS 32, and table motor controller 44.

[0049] Figure 3 illustrates three exemplary malignant and three exemplary benign solitary 
pulmonary nodules, which are located at the center of regions of interest (ROIs). The center 
of the ROI (80 x 80 matrix size, 12-bit gray-scale) corresponds to the central location of a 
nodule. 

[0050] The Inventors constructed a database consisting of seventy-six primary lung cancers
and four hundred thirteen benign nodules, which were obtained from a lung cancer screening
of 7,847 screenees with LDCT (25 - 50 mAs, 10 mm collimation, pitch 2, 10 mm
reconstruction interval). Primary lung cancers were proved by pathologic diagnosis, and
benign nodules were confirmed by diagnostic follow-up examinations or surgery. The size of
the nodules was less than 30 mm. Some nodules were recognized over a few slices in LDCT
images. The seventy-six primary lung cancers consisted of twenty-two nodules with a single
slice LDCT image, thirty-seven nodules with two slice LDCT images, and seventeen nodules
with three slice LDCT images, thus yielding a total of 147 slices of malignant nodules
(1 x 22 + 2 x 37 + 3 x 17). The four hundred thirteen benign nodules consisted of two hundred
sixty-five nodules with a single slice LDCT image, one hundred thirty-three nodules with
two slice LDCT images, and fifteen nodules with three slice LDCT images, which provided a
total of 576 slices of benign nodules (1 x 265 + 2 x 133 + 3 x 15).
[0051] Nodules were segmented from LDCT images by use of a dynamic programming
technique, as will be discussed below. Forty-three features for lung nodules on low-dose
helical CT (LDCT) were extracted and examined by the Inventors. In addition to two clinical
parameters (age and sex), forty-one image features were determined by use of the outline of
the nodule and other image information from the inside and outside regions of the segmented
nodule. The width of the outside region was 5 mm, which was determined empirically.
Feature values based on image analysis were determined by use of the LDCT image and the
corresponding edge gradient image, which was obtained by use of a Sobel filter. The matrix
size of the Sobel filter was 5 x 5 pixels, which appeared to render nodule edges
conspicuously in the edge gradient image. The forty-one image features included seven
features based on the outline, two features based on linear patterns included in the LDCT
image, four features based on edge orientation of the edge gradient image, four features based
on the gray-level distribution of each of the LDCT and edge gradient images (4 x 2 = 8), and
ten features based on the relationship between the two histograms in the inside and outside
regions of the segmented nodule for each of the LDCT and edge gradient images (10 x 2 = 20).
[0052] The effective diameter of a nodule outline is defined by the diameter of a circle with 
the same area as that of the outline. The degree of circularity is defined by the fraction of the 
overlap area of the circle with the nodule outline. The degree of ellipticity is defined in the 
same manner as the degree of circularity, by use of an ellipse instead of a circle fitted to the
nodule outline. The degree of irregularity is defined by 1 minus the perimeter of the circle
divided by the length of the nodule outline, whereas the degree of elliptical irregularity is
computed by use of the perimeter of the fitted ellipse. The root-mean-square variation and the
first moment of the power spectrum are obtained by use of Fourier transformation of the 
distance from the nodule outline to the fitted ellipse. 
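
As a minimal sketch of two of the outline features defined above, the following Python fragment computes the effective diameter and the degree of irregularity from the area and length of an outline. The function names are hypothetical, and the sketch assumes that "the circle" in the definition of irregularity is the equal-area circle used for the effective diameter.

```python
import math


def effective_diameter(area):
    """Diameter of the circle whose area equals the area of the outline."""
    return 2.0 * math.sqrt(area / math.pi)


def degree_of_irregularity(area, outline_length):
    """1 minus (perimeter of the equal-area circle) / (length of the outline)."""
    circle_perimeter = math.pi * effective_diameter(area)
    return 1.0 - circle_perimeter / outline_length


# Example: a 10 x 10 square outline (area 100, perimeter 40), in pixel units.
d_square = effective_diameter(100.0)
irr_square = degree_of_irregularity(100.0, 40.0)

# A perfect circle of radius 5 has zero irregularity under this definition.
irr_circle = degree_of_irregularity(math.pi * 25.0, 2.0 * math.pi * 5.0)
```

Under these assumptions the square has an effective diameter of about 11.28 pixels and an irregularity of about 0.114, while the circle has an irregularity of zero.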

[0053] The magnitude of the line pattern components for both the inside and outside regions 
of the segmented nodule is determined by use of a line enhancement filter, in a direction 
within 45 degrees of the radial line from the center of the ROI. The radial gradient index is 
computed by the mean absolute value of the radial edge gradient projected along the radial 
direction for both the inside and outside regions of the segmented nodule. The tangential 
gradient index is also computed by the mean absolute value of the tangential edge gradient 
projected along the tangential direction for both the inside and outside regions of the 
segmented nodule. The mean pixel value and the relative standard deviation are defined for 
both the inside and outside regions of the segmented nodule. 
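
The radial and tangential gradient indices described above can be illustrated with the following hedged Python sketch. The function name and the input layout (a list of pixel positions with their edge gradient vectors) are assumptions made for illustration; the specification does not state how the gradient vectors are stored.

```python
import math


def gradient_indices(pixels, center):
    """Return (radial index, tangential index) for one region.

    pixels: iterable of (x, y, gx, gy), where (gx, gy) is the edge gradient
            at pixel (x, y); center: (cx, cy) of the ROI.
    """
    cx, cy = center
    radial_sum = 0.0
    tangential_sum = 0.0
    count = 0
    for x, y, gx, gy in pixels:
        rx, ry = x - cx, y - cy
        r = math.hypot(rx, ry)
        if r == 0.0:
            continue  # the radial direction is undefined at the center
        # Projection of the gradient on the radial unit vector (rx, ry) / r.
        radial_sum += abs(gx * rx + gy * ry) / r
        # Projection on the tangential unit vector (-ry, rx) / r.
        tangential_sum += abs(-gx * ry + gy * rx) / r
        count += 1
    if count == 0:
        return 0.0, 0.0
    return radial_sum / count, tangential_sum / count


# Two pixels: one gradient aligned with its radial line, one perpendicular to it.
example = [(1, 0, 2.0, 0.0), (0, 1, 3.0, 0.0)]
radial_index, tangential_index = gradient_indices(example, (0, 0))
```

In this toy example the first pixel contributes only to the radial index and the second only to the tangential index, giving mean values of 1.0 and 1.5.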

[0054] The overlap measures between two histograms are defined by the overlap area of 
gray-level histograms between the inside and outside regions of the segmented nodule. In 
addition, the difference of the mean pixel value, the pixel value at the peak, the peak value, 
the full width at half maximum (FWHM), and the full width at tenth maximum for gray-level 
histograms for both the inside and outside regions of the segmented nodules were used as 
features. The contrast of a nodule is defined by the difference in the mean pixel values
between the 7 x 7 pixel area in the center of the ROI and the outside region of the segmented 
nodule. 
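
The overlap measure and the peak value of the histograms can be sketched as follows. This is an illustration only: the function names, the bin width, and the normalization of each histogram to unit area are assumptions not stated in the specification.

```python
def normalized_histogram(values, lo, hi, bin_width):
    """Histogram of integer gray levels in [lo, hi), normalized to unit area."""
    n_bins = (hi - lo) // bin_width
    hist = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            hist[(v - lo) // bin_width] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * n_bins


def histogram_overlap(h_inside, h_outside):
    """Overlap area of two histograms: sum of the bin-wise minimum."""
    return sum(min(a, b) for a, b in zip(h_inside, h_outside))


# Toy gray levels for the inside and outside regions of a segmented nodule.
inside = [0, 0, 10, 10]
outside = [0, 10, 20, 20]
h_in = normalized_histogram(inside, 0, 30, 10)
h_out = normalized_histogram(outside, 0, 30, 10)
overlap = histogram_overlap(h_in, h_out)
peak_value_inside = max(h_in)
```

With unit-area histograms the overlap lies between 0 (fully separated gray-level distributions) and 1 (identical distributions); the toy data above give an overlap of 0.5.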

[0055] Figure 4 shows gray-level histograms of two exemplary nodules ((a): malignant in 
Figure 3(a), (b): benign in Figure 3(f) for the inside and outside regions of the segmented 
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nodule on the LDCT image. 

[0056] Figure 5 shows the relationship between the effective diameter and the peak value on 
the LDCT image for benign and malignant nodules. Although the distributions in Figure 5 
indicate a considerable overlap between benign and malignant nodules, the result appears to 
indicate the possibility of distinction between benign and malignant nodules. The malignant 
nodules tend to have a large effective diameter and small peak value, whereas benign nodules 
tend to have large peak values even with a relatively small effective diameter. This result 
seems to correspond to the observation in Figure 4, i.e., malignant nodules generally have a 
lower peak and wider width in histograms than do benign nodules, when the effective 
diameter is large. 

[0057] A linear discriminant analysis (LDA) was used as the classifier of features. The
Inventors realized that computation of all combinations of features selected from all
forty-three features is not practical. Accordingly, the Inventors employed Wilks' lambda.
Wilks' lambda is the ratio of within-group variance to the total variance. The F-value, which is
a cost function based on Wilks' lambda, was used to find an initial selection for the number of
features and their combination, as a seed, with LDA.
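
For a single feature and two groups, Wilks' lambda and the associated F-value can be sketched as below. This is a univariate illustration under stated assumptions: the function names are hypothetical, the variances are expressed as sums of squares, and the F-value uses the standard one-way analysis-of-variance relation F = ((1 - lambda) / lambda) x (N - g) / (g - 1) with g = 2 groups, which may differ from the multivariate form used in the stepwise procedure.

```python
def sum_of_squares(values, mean):
    return sum((v - mean) ** 2 for v in values)


def wilks_lambda(benign, malignant):
    """Within-group sum of squares divided by the total sum of squares."""
    pooled = list(benign) + list(malignant)
    grand_mean = sum(pooled) / len(pooled)
    within = (sum_of_squares(benign, sum(benign) / len(benign))
              + sum_of_squares(malignant, sum(malignant) / len(malignant)))
    return within / sum_of_squares(pooled, grand_mean)


def f_value(lam, n_cases, n_groups=2):
    """F statistic for one feature, derived from Wilks' lambda."""
    return (1.0 - lam) / lam * (n_cases - n_groups) / (n_groups - 1)


# Toy feature values: well-separated groups give a small lambda and a large F.
benign_values = [1.0, 2.0, 3.0]
malignant_values = [7.0, 8.0, 9.0]
lam = wilks_lambda(benign_values, malignant_values)
f = f_value(lam, len(benign_values) + len(malignant_values))
```

Here the within-group sum of squares is 4 and the total is 58, so lambda is about 0.069 and the F-value is 54; a feature that does not separate the groups would give a lambda near 1 and an F-value near 0.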

[0058] The Inventors examined combinations of features by an iterative procedure, where 
features are added and removed one-by-one by use of two thresholds on the F-value, one for
removal and another for addition. In this examination, the same threshold for removal and 
addition was employed. The number of selected features depends on this threshold. For 
example, when the threshold level decreased from 4 to 3 to 2 to 1, the number of selected 
features increased from 3 to 7 to 7 to 12, respectively, with corresponding Az values of 
0.822, 0.823, 0.823, and 0.815. Therefore, the optimal number of features was determined to 
be seven, because it yielded the highest Az value.
[0059] The Inventors experimented with many different combinations of seven features.
Features were repeatedly appended to and deleted from the preselected seven features. The
effective diameter and the contrast of the nodule in the LDCT image were always kept
among the seven features because these two features are particularly important for
diagnosis by radiologists. The determined optimal combination consisted of
(1) the effective diameter, (2) the contrast of the segmented nodule on the LDCT image,
(3) sex, (4) the overlap measure of two gray-level histograms for the inside and outside
regions of the segmented nodule on the LDCT image, (5) the overlap measure of two
gray-level histograms for the inside and outside regions of the segmented nodule on the edge
gradient image, (6) the radial gradient index for the inside region of the segmented
nodule on the LDCT image, and (7) the peak value of the histogram for the inside region of the
segmented nodule on the edge gradient image. This combination provided an Az value of
0.828.

[0060] The Inventors used a round-robin (leave-one-out) test for training and testing of the 
LDA. Training was carried out with all cases except one in the database, and the one case not 
used for training was applied for testing with the trained LDA. This procedure was repeated 
until every case in the database was used once. LDA separates benign from malignant 
nodules by use of a hyperplane. The output value of LDA represents the distance of either a 
benign or a malignant nodule from the hyperplane. The output value of the LDA is 
normalized as the likelihood of malignancy such that the maximum distance in one direction 
for a benign nodule and the maximum distance in another direction for a malignant nodule 
from the hyperplane correspond to 0 and 1.0, respectively. An output value was considered
to indicate a higher likelihood of malignancy if it was far from the hyperplane on the
malignant side (i.e., close to 1.0), and vice versa. In addition, it was considered to be
"less definitive" when the value of the likelihood of malignancy was close to the
hyperplane. The performance of
the automated computerized scheme was evaluated by use of ROC analysis. The area under 
the ROC curve, Az, was used as a measure of performance. The LABROC4 program was 
used for obtaining the ROC curves. In addition, the statistical significance was determined by 
the bivariate test in ROCKIT. 
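
The round-robin test and the normalization of the output can be sketched as follows. To keep the example short, a toy one-feature discriminant stands in for the seven-feature LDA: its "hyperplane" is the midpoint of the two class means. The function names, the toy classifier, and the min-max form of the normalization are assumptions for illustration, not the Inventors' implementation.

```python
def train_midpoint(samples):
    """Toy 1-D discriminant: threshold at the midpoint of the class means.

    samples: list of (feature value, label), label 0 = benign, 1 = malignant.
    """
    benign = [x for x, label in samples if label == 0]
    malignant = [x for x, label in samples if label == 1]
    return (sum(benign) / len(benign) + sum(malignant) / len(malignant)) / 2.0


def leave_one_out(samples):
    """Signed distance from the hyperplane for each case, trained without it."""
    outputs = []
    for i, (x, _) in enumerate(samples):
        threshold = train_midpoint(samples[:i] + samples[i + 1:])
        outputs.append(x - threshold)
    return outputs


def normalize(distances):
    """Map the extreme benign-side and malignant-side distances to 0 and 1."""
    lo, hi = min(distances), max(distances)
    return [(d - lo) / (hi - lo) for d in distances]


cases = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
distances = leave_one_out(cases)
likelihoods = normalize(distances)
```

In this sketch each case is scored by a discriminant that never saw it during training, and the resulting signed distances are rescaled so that the most benign-looking case maps to 0 and the most malignant-looking case maps to 1.0.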

[0061] The likelihood of malignancy for a pulmonary nodule in CT images can be 
determined by two different methods. One is based on single slice LDCT images. Another is 
based on multiple slices, which include some parts of nodules in LDCT images. In a method 
by use of single slice LDCT images, only one slice including the largest effective diameter of 
a nodule was used for determination of the likelihood of malignancy. In another method with 
the use of multiple slices of nodule images, four different techniques can be used for data 
integration of all slices of nodule images in order to determine the likelihood of malignancy 
representing a nodule that appears in a few slices. First, the distances from the hyperplane for
nodules in all slices were determined independently. Then, the likelihood of malignancy was
determined from (1) the largest distance among those distances for all slices with the nodule,
(2) the shortest distance among those distances for all slices with the nodule, (3) the mean
distance of those distances for all slices with the nodule, or (4) the weighted mean distance of
those distances for all slices with the nodule by use of the effective diameter of the nodule at
each slice as the weighting factor.
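
The four integration techniques listed above can be sketched in a few lines of Python. The function name and the method labels are hypothetical, and the sketch assumes that the distances are signed values on a common scale, which the specification does not state explicitly.

```python
def integrate_slices(distances, diameters, method):
    """Combine per-slice distances from the hyperplane into one value.

    distances: distance from the hyperplane for the nodule in each slice.
    diameters: effective diameter of the nodule in each slice (the weights).
    """
    if method == "largest":
        return max(distances)
    if method == "shortest":
        return min(distances)
    if method == "mean":
        return sum(distances) / len(distances)
    if method == "weighted_mean":
        weighted = sum(d * w for d, w in zip(distances, diameters))
        return weighted / sum(diameters)
    raise ValueError("unknown method: " + method)


# A nodule seen in three slices, with the largest cross section in the middle.
slice_distances = [0.2, 0.6, 0.4]
slice_diameters = [5.0, 10.0, 5.0]
results = {m: integrate_slices(slice_distances, slice_diameters, m)
           for m in ("largest", "shortest", "mean", "weighted_mean")}
```

For this example the weighted mean (0.45) lies above the plain mean (0.4) because the slice with the largest effective diameter also has the largest distance.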

[0062] Figure 10 shows two ROC curves obtained with the single slice and the multiple 
slices methods for distinguishing between benign and malignant nodules, by use of our 
automated computerized scheme. Table 1 shows a comparison of Az values obtained with the 
single slice method for several different combinations of features which provided Az values 
greater than or equal to 0.825, and also the corresponding Az values obtained with the
multiple slices method. The largest Az value obtained by the computerized scheme for single
slice LDCT images in distinguishing benign from malignant nodules was 0.828. However,
the Az value was improved to 0.846 by use of multiple slice LDCT images (P = 0.03).
[0063] Figures 11(a) and (b) show the distributions of the LDA output obtained with single
slice LDCT images and multiple slice LDCT images, respectively. Figure 11(b) indicates a
better separation in distinguishing between benign and malignant nodules than does
Figure 11(a), which is consistent with the result obtained with Az values.
[0064] Table 1.
[0065] Common features: (1) effective diameter, (2) contrast of the segmented nodule on the
CT image, (3) sex, (4) overlap measure of two gray-level histograms for the inside and 
outside regions of the segmented nodule on the LDCT image, (5) overlap measure of two 
gray-level histograms for the inside and outside regions of the segmented nodule on the edge 
gradient image. 

[0066] Other features: (6) radial gradient index for the inside region of the segmented nodule 
on the LDCT image, (7) peak value of the histogram for the inside region of the segmented 
nodule on the edge gradient image, (8) pixel value at the peak of the histogram for the inside 
region of the segmented nodule on the edge gradient image, (9) pixel value at the peak of the 
histogram for the inside region of the segmented nodule on the LDCT image, (10) full width
at half maximum of the histogram for the inside region of the segmented nodule on the
LDCT image. 

[0067] A mask ROI is a small binary image, in which a pixel value of 1 or 0 indicates a pixel 
inside or outside a lung region, respectively. The mask ROI contains important information 
for the segmentation of nodules and for the determination of the features of nodules. In order 
to obtain a mask ROI for lung regions, said lung regions are segmented from background for 
each section of the LDCT scan by use of a thresholding technique. A pixel with a CT value 
between -400 HU and -1000 HU is considered as being located inside the lung regions and 
is thus assigned a value of 1; otherwise, the pixel is considered as belonging to background
and is thus assigned a value of 0. If a nodule is connected to the pleura, the nodule would
be excluded from the lung regions because the gray-scale values for the pixels inside the
nodule would be out of the range between -400 HU and -1000 HU. A rolling-ball algorithm
is employed along the outlines of lung regions to compensate for this type of segmentation
error. After the lung regions are segmented from the background in the entire section, a
mask ROI of 80 x 80 pixels is determined at the location of a nodule from the segmented 
binary image. The nodule segmentation technique is applied only to those pixels inside the 
lung regions. 
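
The thresholding step and the extraction of the mask ROI can be sketched as follows. The function names are hypothetical, the bounds are assumed to be inclusive, the rolling-ball correction is omitted, and the ROI is assumed to lie entirely inside the section.

```python
def lung_mask(section_hu, lower=-1000, upper=-400):
    """Binary mask: 1 for pixels with CT values in [lower, upper] HU, else 0."""
    return [[1 if lower <= v <= upper else 0 for v in row] for row in section_hu]


def crop_roi(image, center_row, center_col, size=80):
    """Square ROI of size x size pixels around the nodule location."""
    half = size // 2
    rows = image[center_row - half:center_row + half]
    return [row[center_col - half:center_col + half] for row in rows]


# Toy 4 x 4 section in HU; a 2 x 2 ROI is used instead of 80 x 80 for brevity.
section = [
    [-1000, -800, 40, -700],
    [-500, 30, -950, -400],
    [-399, -600, 100, -1001],
    [0, 0, 0, 0],
]
mask = lung_mask(section)
mask_roi = crop_roi(mask, 2, 2, size=2)
```

Pixels at exactly -400 HU or -1000 HU are treated as lung in this sketch; whether the original thresholds are inclusive is not stated.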

[0068] In order to segment a nodule from background, a preprocessing step is utilized for 
correction of the background trend included in an original ROI. The background trend in the
ROI is represented by a two-dimensional (surface) linear function, and the three
coefficients of the linear function are determined by a least-squares method. The estimated
surface function is then subtracted from the original ROI to provide a background-trend
corrected ROI. Only the pixels inside the lung regions are employed for the determination of
the coefficients of the linear function. Next, a multiple-thresholding technique is applied to
the background-trend corrected ROI for creation of a set of binary images. The initial
threshold is selected to be the gray-scale value of the pixel at the center of the ROI, and the
subsequent thresholds are gradually decreased in steps of 5 HU until all the pixels
in the lung regions are segmented as object pixels (i.e., with a value of 1 in the segmented
image). The step of 5 HU was determined empirically in this study. For each of the binary
images, the contour of the nodule region including the center of the ROI is delineated, and 
the intersection points between the contour and a set of evenly distributed radial lines 
pointing outward from the center of the ROI are determined. In the exemplary illustration of 
Figure 7, there are a total of 60 such radial lines with an angle of 6 degrees between two 
adjacent lines. 
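
The background-trend correction described above can be sketched as a least-squares fit of the plane z = a*x + b*y + c over the lung pixels, followed by subtraction of the fitted plane. The function names are hypothetical, and solving the 3 x 3 normal equations by Cramer's rule is a choice made here for a dependency-free example.

```python
def det3(q):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (q[0][0] * (q[1][1] * q[2][2] - q[1][2] * q[2][1])
            - q[0][1] * (q[1][0] * q[2][2] - q[1][2] * q[2][0])
            + q[0][2] * (q[1][0] * q[2][1] - q[1][1] * q[2][0]))


def fit_plane(roi, mask):
    """Least-squares coefficients (a, b, c) using only pixels where mask is 1."""
    sxx = sxy = syy = sx = sy = n = sxz = syz = sz = 0.0
    for y, row in enumerate(roi):
        for x, z in enumerate(row):
            if mask[y][x]:
                sxx += x * x
                sxy += x * y
                syy += y * y
                sx += x
                sy += y
                n += 1
                sxz += x * z
                syz += y * z
                sz += z
    normal = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(normal)
    coeffs = []
    for k in range(3):
        replaced = [row[:] for row in normal]
        for i in range(3):
            replaced[i][k] = rhs[i]  # Cramer's rule: swap column k for the rhs
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


def subtract_trend(roi, coeffs):
    a, b, c = coeffs
    return [[z - (a * x + b * y + c) for x, z in enumerate(row)]
            for y, row in enumerate(roi)]


# Toy 4 x 4 ROI that is a pure linear trend, with every pixel inside the lung.
roi = [[2.0 * x + 3.0 * y + 5.0 for x in range(4)] for y in range(4)]
all_lung = [[1] * 4 for _ in range(4)]
plane = fit_plane(roi, all_lung)
corrected = subtract_trend(roi, plane)
```

Because the toy ROI contains nothing but the trend, the fitted coefficients are (2, 3, 5) and the corrected ROI is zero everywhere; on a real ROI the residual would contain the nodule and other structures.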

[0069] Figure 6 shows schematically the contours of two regions (solid curves), sixteen 
radial lines (dashed lines), and thirty-two intersection points (solid circles) between the 
contours and the radial lines. There are many contours and intersection points in the process 
of nodule segmentation. Because the intersection points A and B are located far away from 
each other, the gradient of pixel values from A to B has a small magnitude, which implies a 
slow change of pixel values from A to B. Therefore, it is unlikely that the intersection points 
A and B are on a clear edge. On the other hand, because the intersection points C and D are 
close to each other, the magnitude of the gradient of pixel values from C to D is large, which 
implies that the intersection points C and D would be located on an edge. Therefore, if the 
distance between two consecutive intersection points on a radial line is smaller than 1.5 
pixels (approximately 0.9 mm), the two points are considered to belong to the same edge. If 
more than three consecutive intersection points on a radial line satisfy the above condition,
an edge point is found, and its location is defined as that of the middle point of those
consecutive intersection points. The larger the number of such consecutive intersection
points, the more likely it is that these points constitute a significant edge point. Therefore,
the number of the intersection points on an edge point provides important information for the
edge point, and is employed for the segmentation of nodules by use of a dynamic
programming technique. 
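
The grouping rule for intersection points on one radial line can be sketched as follows. The function name is hypothetical; the sketch reads "more than three" as at least four points and takes the "middle point" to be the midpoint between the first and last points of a group, both of which are interpretations rather than stated details.

```python
def find_edge_points(radii, max_gap=1.5, min_points=4):
    """Detect edge points along one radial line.

    radii: distances (in pixels) from the ROI center to the intersection
           points found on this radial line across all binary images.
    Returns a list of (edge location, number of intersection points).
    """
    edge_points = []
    group = []

    def flush():
        # Keep the group only if it has more than three consecutive points.
        if len(group) >= min_points:
            edge_points.append(((group[0] + group[-1]) / 2.0, len(group)))

    for r in sorted(radii):
        if group and r - group[-1] >= max_gap:
            flush()
            group.clear()
        group.append(r)
    flush()
    return edge_points


# One isolated point, a tight cluster of five points, and a loose pair.
line_radii = [3.0, 10.0, 10.5, 11.0, 11.5, 12.0, 20.0, 21.0]
edges = find_edge_points(line_radii)
```

Only the cluster of five closely spaced points qualifies, yielding a single edge point at radius 11.0 with a count of 5; the count is the quantity that the text says is later used by the dynamic programming technique.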

[0070] Figure 7 shows an example for (a) the original image with a nodule at the center, (b) 
the original image with the detected edge points, and (c) the original image with the 
delineated outline for the nodule by use of a dynamic programming technique. Figure 7 
illustrates that the pixels within the nodule include a large variation in the gray-scale values, 
and the pixels around the nodule contain a complex background such as vessels. Therefore, it 
is difficult to segment this nodule by use of a conventional thresholding technique. Although 
most of the edge points along the outline of the nodule were detected in Fig. 7(b) with the 
technique described above, there are still a small number of missing edge points along the 
outline of the nodule, and some erroneous "edge" points caused by the complex background. 
In order to employ a dynamic programming technique to determine a reliable outline for a
nodule, the outline of the nodule is defined as a series of 60 outline nodes, each of which is
located on a radial line. For those radial lines with no edge point, a virtual interpolated edge
point is created based on the information on edge points on the adjacent radial lines. For
radial lines with multiple edge points, one edge point is selected as the outline node by a
dynamic programming technique.

[0071] Dynamic programming is a technique for solving combinatorial optimization 
problems in which the solution space is so large that conventional optimization 
techniques cannot provide an optimal solution by enumerating and comparing all 
possible solutions. For example, if there are two edge points on each of 60 radial lines, the 
number of all possible solutions (outlines) is 2^60, which is such a large number that it is 
impractical to find an optimal outline by comparing all possible outlines. However, 
with a dynamic programming technique, it is possible to determine an optimal outline (in 
terms of a cost function) among all possible ones by use of a multiple-stage decision process. 
The problem of determining an optimal outline with 60 nodes is divided into a problem of 60 
stages, each of which represents an optimal "simple" decision made in the process when 
proceeding from a previous radial line to a current one. This decomposition is possible 
because the optimal solution at stage J depends only on the optimal solution at stage (J-1) 
and the optimal simple decision from stage (J-1) to stage J, and does not depend on the 
optimal solutions at earlier stages (J-2), (J-3), ..., 1. 
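
As a rough illustration of the gap described above, the following Python sketch compares the number of complete outlines an exhaustive search would have to score with the number of stage-to-stage comparisons a staged decision process needs. The function names are hypothetical and are not part of the disclosed method; the sketch assumes the same number of edge points on every radial line.

```python
def count_exhaustive_outlines(points_per_line: int, num_lines: int) -> int:
    """Number of complete outlines if every combination were enumerated."""
    return points_per_line ** num_lines


def count_stage_comparisons(points_per_line: int, num_lines: int) -> int:
    """Number of (previous point, current point) pairs examined when the
    problem is solved stage by stage: every stage after the first compares
    each current point with each point on the previous radial line."""
    return (num_lines - 1) * points_per_line ** 2


if __name__ == "__main__":
    print(count_exhaustive_outlines(2, 60))  # 2^60 complete outlines
    print(count_stage_comparisons(2, 60))    # pairwise comparisons over 59 transitions
```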

[0072] Figure 8 shows two adjacent radial lines (solid lines), four edge points on the radial 
lines (solid circles A', B', C', and D'), and a virtual interpolated point (open circle E') that 
was used in the determination of an optimal outline for a nodule by use of a dynamic 
programming technique. O' is the center of an ROI containing a nodule. For the purposes 
of this discussion, assume that the two partial optimal outlines (including (J-1) nodes) 
starting from the edge points on the first radial line to the edge points A' and B' on the radial 
line (J-1) are known. Also known are the costs of the two partial optimal outlines to the 
edge points A' and B', namely, Total cost(A') and Total cost(B'). With a dynamic 
programming technique, it is straightforward to determine the costs for two partial optimal 
outlines of J nodes starting from the edge points on the first radial line to the edge points C' 
and D' on the radial line J. For example, when Total cost(A') + Local cost(A',C') is 
smaller than Total cost(B') + Local cost(B',C'), the partial optimal outline to C' 
(including J nodes) consists of that to A' (including (J-1) nodes) and the edge point C', 
and the cost of the partial optimal outline to C' is defined by 

Total cost(C') = Total cost(A') + Local cost(A',C'). (1) 
Here, Local cost(A',C') represents a local cost function from the edge point A' to C'. 
Similarly, the partial optimal outline to D' and its associated cost can be determined. This 
process constitutes a stage of a dynamic programming technique, and is repeated until J 
is equal to 60. At the final stage (J=60), if Total cost(C') is smaller than Total cost(D'), 
then the optimal outline to C' (including 60 nodes) is considered to be better than that to D', 
and the edge point C' is considered to be a node on the optimal outline. According to Eq. 
(1), it is also apparent that the edge point A' on the radial line (J-1) is another node on the 
optimal outline. Similarly, from the node A', an edge point on the radial line (J-2) can be 
obtained as a third node on the optimal outline. Repeating this backward tracing process 
60 times, all the nodes on the optimal outline for a nodule can be determined. Therefore, a 
dynamic programming technique is composed of two processes, a forward process for the 
cost calculation and a backward process for the tracing of nodes on the optimal outline. 
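
The forward and backward processes described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name optimal_outline, the representation of each radial line as a non-empty list of candidate points, and the caller-supplied local_cost and start_cost functions are all hypothetical.

```python
def optimal_outline(stages, local_cost, start_cost):
    """Return (outline, cost) minimizing the summed cost over all stages.

    stages     -- list of non-empty lists; stages[j] holds the candidate
                  edge points on radial line j (assumed representation)
    local_cost -- function (previous_point, current_point) -> float
    start_cost -- function (point) -> float for points on the first line
    """
    # Forward process: Total cost(C') = min over A' of
    # Total cost(A') + Local cost(A', C'), remembering the best A'.
    total = [[start_cost(p) for p in stages[0]]]
    back = [[-1] * len(stages[0])]
    for j in range(1, len(stages)):
        row_total, row_back = [], []
        for cur in stages[j]:
            costs = [total[j - 1][i] + local_cost(prev, cur)
                     for i, prev in enumerate(stages[j - 1])]
            best = min(range(len(costs)), key=costs.__getitem__)
            row_total.append(costs[best])
            row_back.append(best)
        total.append(row_total)
        back.append(row_back)

    # Backward process: start from the cheapest point on the last line
    # and follow the stored predecessors back to the first line.
    k = min(range(len(total[-1])), key=total[-1].__getitem__)
    best_cost = total[-1][k]
    outline = []
    for j in range(len(stages) - 1, -1, -1):
        outline.append(stages[j][k])
        k = back[j][k]
    outline.reverse()
    return outline, best_cost


if __name__ == "__main__":
    # Toy example: points are radii, local cost is the radial jump.
    toy = [[1, 5], [2, 9], [3, 8]]
    print(optimal_outline(toy, lambda a, c: abs(a - c), lambda p: 0.0))
```

In the toy call, each candidate is just a radius and the local cost is the absolute radial jump, so the smoothest sequence of radii is selected.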
[0073] The local cost function Local cost(A',C') generally consists of two components, 
i.e., an internal cost function and an external cost function, 

Local cost(A',C') = W x Int cost(A',C') + Ext cost(C'), (2) 
where Int cost(A',C') and Ext cost(C') are the internal cost function and the external cost 
function, respectively, and W is a constant that makes the range of values for the two cost 
functions comparable, as will be described later. In this study, the internal cost function Int 
cost(A',C') was defined as a normalized distance between A' and C', 

Int cost(A',C') = 2 x dist(A',C') / (dist(O',A') + dist(O',C')), 
where the functions dist(A',C'), dist(O',A'), and dist(O',C') represent the distances between 
A' and C', O' and A', and O' and C', respectively. For a large nodule, the normalization 
factor "1 / (dist(O',A') + dist(O',C'))" reduces the internal cost function Int cost(A',C') 
caused by the difference in locations between two edge points A' and C'. That is to say, a 
relatively large difference in locations for two nodes is tolerable for a large nodule. When 
dist(O',A') is equal to dist(O',C'), the internal cost function Int cost(A',C') reaches its 
minimum value, 2 x dist(A',F') / dist(O',A') = 2 x sin(6°/2), which is approximately equal 
to 0.1, where 6° is the angle between adjacent radial lines (360°/60). The smaller the internal 
cost, the smoother the outline at an outline point. 
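
The internal cost and its minimum value can be checked numerically with a short Python sketch. The helper names int_cost and polar_point and the representation of points as (x, y) tuples are assumptions made for illustration only.

```python
import math


def polar_point(radius, angle_deg, center=(0.0, 0.0)):
    """Point at the given distance from the ROI center on a radial line."""
    theta = math.radians(angle_deg)
    return (center[0] + radius * math.cos(theta),
            center[1] + radius * math.sin(theta))


def int_cost(a, c, center=(0.0, 0.0)):
    """Int cost(A',C') = 2 x dist(A',C') / (dist(O',A') + dist(O',C'))."""
    return 2.0 * math.dist(a, c) / (math.dist(center, a) + math.dist(center, c))


if __name__ == "__main__":
    # Equal radii on adjacent lines 6 degrees apart: the minimum value.
    print(int_cost(polar_point(10, 0), polar_point(10, 6)))
    print(2 * math.sin(math.radians(6) / 2))
    # The same 2-pixel radial jump costs less on a larger nodule.
    print(int_cost(polar_point(10, 0), polar_point(12, 6)))
    print(int_cost(polar_point(30, 0), polar_point(32, 6)))
```

Because of the normalization, the value for equal radii does not depend on the nodule size, while a fixed radial jump is penalized less as the radius grows.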
[0074] The external cost function for an edge point C' is defined as the negative value of the 
number of intersection points on it, 

Ext cost(C') = - (number of contour points on C'). 
As discussed, the number of intersection points on an edge point indicates the likelihood that 
the edge point is a true edge point; therefore, a strong edge point provides a small (strongly 
negative) external cost. It should be noted that the external cost for an edge point does not 
depend on the edge points on a previous radial line. Using the local cost function defined above, the 
optimal (i.e., minimum cost) outline for a nodule determined by use of the dynamic 
programming technique would be a smooth curve located at strong edge points. The constant 
W in Eq. (2) is another important factor to be determined. In this study, the 
constant W is assigned a value of 110, which is approximately equal to the absolute value 
of the ratio of the mean external cost (approximately -22) to the mean internal cost 
(approximately 0.2) for the edge points on the outlines of 40 randomly selected nodules. This 
value of 110 for W performed very well for the database used in this study. 
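
The arithmetic behind the weight W and its role in Eq. (2) can be sketched as follows. The function names are hypothetical, and the mean costs are the approximate values quoted in the text, not recomputed from any data.

```python
MEAN_EXT_COST = -22.0  # approximate mean external cost quoted in the text
MEAN_INT_COST = 0.2    # approximate mean internal cost quoted in the text

# W balances the two terms: |mean external cost / mean internal cost|.
W = abs(MEAN_EXT_COST / MEAN_INT_COST)


def ext_cost(num_contour_points):
    """Ext cost(C') = -(number of contour points on C')."""
    return -float(num_contour_points)


def local_cost(int_cost_value, num_contour_points, w=W):
    """Local cost(A',C') = W x Int cost(A',C') + Ext cost(C'), Eq. (2)."""
    return w * int_cost_value + ext_cost(num_contour_points)


if __name__ == "__main__":
    print(W)
    # With typical values the two terms roughly cancel.
    print(local_cost(MEAN_INT_COST, 22))
    # A stronger edge point lowers the cost for the same smoothness.
    print(local_cost(0.2, 30) < local_cost(0.2, 10))
```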
[0075] When there is no edge point on the current radial line, a virtual interpolated edge 
point is created with a relatively large penalty value for the local cost function. In fact, 
even for those radial lines with multiple edge points, such as the current radial line with the 
edge points C' and D' in Figure 8, a virtual edge point E' is still created, because the costs 
of using the edge points C' and D' may be so large that even the use of the virtual edge 
point E' with a penalty is preferable. For example, if the edge points C' and D' are 
located too far away from the edge points A' and B', then the cost of using C' and D' would 
be very large (because of the internal cost function), and the virtual edge point E' would be a 
preferable choice over the edge points C' and D'. In order to determine the location of a 
virtual edge point on the current radial line J, the edge point (for example, the edge point A' 
in Figure 8) on the previous radial line (J-1) is first selected whose partial 
optimal outline has a smaller cost than those of the partial optimal outlines to all other edge 
points. The location of the virtual edge point (for example, the virtual edge point E' in Figure 
11) is then determined on the current radial line J such that the distance (dist(O',E')) between 
it and the center of the ROI is equal to that (dist(O',A')) between the edge point selected 
above and the center of the ROI. Based on Eq. (2), the penalty cost function for the virtual 
edge point E' in Figure 8 should satisfy the following relationship: 

Local cost(A',E') = W x Int cost(A',E') + P > W x Int cost(A',E') = 110 x 0.1 = 11, 
where P is a positive penalty constant added to Local cost(A',E') because of the lack of a 
significant edge at E'. This inequality shows that the cost function for a virtual edge point 
should be larger than 11. In this study, a value of 20 was empirically determined as the penalty 
cost function for a virtual edge point. 
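
The placement of a virtual edge point and the lower bound on its cost can be sketched in Python. Several things here are assumptions: the names place_virtual_point and virtual_cost_lower_bound are hypothetical, points are (x, y) tuples, and the value of 20 is read as the whole local cost assigned to a virtual point; reading it as the constant P added on top of the internal term would be equally compatible with the text.

```python
import math

W = 110.0                # weight from Eq. (2)
VIRTUAL_COST = 20.0      # assumed reading: whole local cost of a virtual point
LINE_SPACING_DEG = 6.0   # 360 degrees / 60 radial lines


def place_virtual_point(center, best_prev_point, line_angle_deg):
    """Put E' on the current radial line at the same distance from the
    ROI center O' as the selected point A' on the previous line."""
    radius = math.dist(center, best_prev_point)
    theta = math.radians(line_angle_deg)
    return (center[0] + radius * math.cos(theta),
            center[1] + radius * math.sin(theta))


def virtual_cost_lower_bound(w=W, spacing_deg=LINE_SPACING_DEG):
    """W x Int cost(A',E') when dist(O',A') equals dist(O',E')."""
    return w * 2.0 * math.sin(math.radians(spacing_deg) / 2.0)


if __name__ == "__main__":
    center = (50.0, 50.0)
    a_prime = (60.0, 50.0)                       # on the line at 0 degrees
    e_prime = place_virtual_point(center, a_prime, 6.0)
    print(e_prime, math.dist(center, e_prime))   # same radius as A'
    print(virtual_cost_lower_bound())            # about 11
    print(VIRTUAL_COST > virtual_cost_lower_bound())
```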

[0076] Finally, for an edge point on the first radial line, only the external cost function is 
employed as the total cost for it, because it is the starting point of an outline. Another 
problem is that there may be a large discontinuity between the first node and the last (60th) 
node on the optimal outline determined by use of the dynamic programming technique. To 
overcome this problem, the dynamic programming recurrence was applied a total of 120 times 
sequentially for the radial lines 1, 2, 3, ..., 59, 60, 1, 2, 3, ..., 59, 60. Only the results for the 
last 60 recurrences were employed for the determination of the optimal outline for a nodule. 
Fig. 7(c) shows the segmentation result for the nodule in Fig. 7(a) by use of the dynamic 
programming technique. 
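
The two-pass scheme described in paragraph [0076] can be sketched in Python as follows. This is an illustration under assumptions, not the disclosed implementation: the function name closed_outline, the list-of-lists representation of the radial lines, and the caller-supplied cost functions are hypothetical, and every radial line is assumed to hold at least one candidate (real or virtual).

```python
def closed_outline(stages, local_cost, start_cost):
    """Run the recurrence over lines 1..N, 1..N and keep the last N nodes.

    stages     -- list of non-empty candidate lists, one per radial line
    local_cost -- function (previous_point, current_point) -> float
    start_cost -- function (point) -> float for the very first line
    """
    n = len(stages)
    doubled = list(stages) + list(stages)   # lines 1..N followed by 1..N

    # Forward process over 2N stages; back[j - 1] stores, for each point
    # on stage j, the index of its best predecessor on stage j - 1.
    total = [start_cost(p) for p in doubled[0]]
    back = []
    for j in range(1, 2 * n):
        new_total, row_back = [], []
        for cur in doubled[j]:
            costs = [total[i] + local_cost(prev, cur)
                     for i, prev in enumerate(doubled[j - 1])]
            best = min(range(len(costs)), key=costs.__getitem__)
            new_total.append(costs[best])
            row_back.append(best)
        total = new_total
        back.append(row_back)

    # Backward process restricted to the second pass (last N stages).
    k = min(range(len(total)), key=total.__getitem__)
    outline = []
    for j in range(2 * n - 1, n - 1, -1):
        outline.append(doubled[j][k])
        k = back[j - 1][k]
    outline.reverse()
    return outline


if __name__ == "__main__":
    # Radii only; a single pass would start at 10 and end at 16.
    lines = [[10, 16], [12], [14], [16]]
    print(closed_outline(lines, lambda a, c: (a - c) ** 2, lambda p: 0.0))
```

In the toy input, a single pass would pick radius 10 on the first line and end at 16 on the last line, leaving a gap where the outline closes. In the second pass the first line is preceded by the last one, so the candidate 16 is selected instead.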

[0077] Without the creation of the virtual edge points, either (1) the dynamic programming 
technique would terminate early at a radial line on which no edge point is detected, or (2) the 
outline of a nodule delineated by the dynamic programming technique would be attracted to 
"erroneous" edge points when there is no "correct" edge point detected at some radial lines. 
Therefore, it is important to employ such virtual edge points in the process of delineating the 
outline of a nodule by use of the dynamic programming technique. 
[0078] Figure 9 shows the six segmented nodule regions (see Figure 3). The automated 
segmentation technique provided an approximate region, which is adequate for the 
subsequent analysis. 

[0079] This invention conveniently may be implemented using a conventional general 
purpose computer or micro-processor programmed according to the teachings of the present 
invention, as will be apparent to those skilled in the computer art. Appropriate software can 
readily be prepared by programmers of ordinary skill based on the teachings of the present 
disclosure, as will be apparent to those skilled in the software art. 
[0080] Figure 12 is a schematic illustration of a computer system for the computerized 
analysis of the likelihood of malignancy in pulmonary nodules. A computer 100 implements 
the method of the present invention, wherein the computer housing 102 houses a 
motherboard 104 which contains a CPU 106, memory 108 (e.g., DRAM, ROM, EPROM, 
EEPROM, SRAM, SDRAM, and Flash RAM), and other optional special purpose logic 
devices (e.g., ASICs) or configurable logic devices (e.g., GAL and reprogrammable FPGA). 
The computer 100 also includes plural input devices (e.g., a keyboard 122 and mouse 124), 
and a display card 110 for controlling monitor 120. In addition, the computer 100 further 
includes a floppy disk drive 114; other removable media devices (e.g., compact disc 119, 
tape, and removable magneto-optical media (not shown)); and a hard disk 112, or other fixed, 
high density media drives, connected using an appropriate device bus (e.g., a SCSI bus, an 
Enhanced IDE bus, or an Ultra DMA bus). Also connected to the same device bus or another 
device bus, the computer 100 may additionally include a compact disc reader 118, a compact 
disc reader/writer unit (not shown) or a compact disc jukebox (not shown). Although 
compact disc 119 is shown in a CD caddy, the compact disc 119 can be inserted directly into 
CD-ROM drives which do not require caddies. 

[0081] As stated above, the system includes at least one computer readable medium. 
Examples of computer readable media are compact discs 119, hard disks 112, floppy disks, 
tape, magneto-optical disks, PROMs (EPROM, EEPROM, Flash EPROM), DRAM, SRAM, 
SDRAM, etc. Stored on any one or on a combination of computer readable media, the 
present invention includes software both for controlling the hardware of the computer 100 
and for enabling the computer 100 to interact with a human user. Such software may include, 
but is not limited to, device drivers, operating systems and user applications, such as 
development tools. Such computer readable media further include the computer program 
product of the present invention for performing the inventive method disclosed above. 
The computer code devices of the present invention can be any interpreted or executable 
code mechanism, including but not limited to scripts, interpreters, dynamic link libraries, 
Java classes, and complete executable programs. Moreover, parts of the processing of the 
present invention may be distributed for better performance, reliability, and/or cost. For 
example, an outline or image may be selected on a first computer and sent to a second 
computer for remote diagnosis. 

[0082] The invention may also be implemented by the preparation of application specific 
integrated circuits or by interconnecting an appropriate network of conventional component 
circuits, as will be readily apparent to those skilled in the art. 

[0083] Numerous modifications and variations of the present invention are possible in light 
of the above teachings. It is therefore to be understood that within the scope of the appended 
claims, the invention may be practiced otherwise than as specifically described herein. 